Tools for Large Graph Mining

Author

  • Deepayan Chakrabarti
Abstract

Graphs show up in a surprisingly diverse set of disciplines, ranging from computer networks to sociology, biology, ecology and many more. What do “normal” graphs in these disciplines look like? How can we spot abnormal subgraphs within them? Which nodes or edges are “suspicious”? How does a virus spread over a graph? Answering these questions is vital for outlier detection (such as finding terrorist cells or money laundering rings), forecasting, simulations (how well will a new protocol work on a realistic computer network?), immunization campaigns and many other applications. We attempt to answer these questions in two parts. First, we answer questions targeted at applications: what patterns or properties of a graph are important for solving specific problems? Here, we investigate the propagation behavior of a computer virus over a network, and find a simple formula for the epidemic threshold (beyond which any viral outbreak might become an epidemic). We find an “information survival threshold” which determines whether, in a sensor or P2P network with failing nodes and links, a piece of information will survive or not. We also develop a scalable, parameter-free method for finding groups of “similar” nodes in a graph, corresponding to homogeneous regions (or CrossAssociations) in the binary adjacency matrix of the graph. This can help navigate the structure of the graph and find non-obvious patterns. In the second part of our work, we investigate recurring patterns in real-world graphs, to gain a deeper understanding of their structure. This leads to the development of the R-MAT model of graph generation for creating synthetic but “realistic” graphs, which match many of the patterns found in real-world graphs, including power-law and lognormal degree distributions, small diameter and “community” effects.
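The abstract names the R-MAT generator without describing how it works. As a rough illustration only, the sketch below assumes the commonly described recursive procedure: the adjacency matrix is split into four quadrants, one quadrant is chosen with probabilities a, b, c, d, and the choice is repeated inside that quadrant until a single cell (one edge) remains. The function names `rmat_edge` and `rmat_graph`, the default probabilities, and the duplicate-edge handling are assumptions made for this example and are not taken from the abstract.

```python
import random


def rmat_edge(scale, a=0.45, b=0.15, c=0.15, d=0.25, rng=random):
    """Pick one directed edge (row, col) in a 2**scale x 2**scale matrix.

    The default a, b, c, d values are illustrative assumptions.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("a + b + c + d must equal 1")
    row = col = 0
    for _ in range(scale):
        # Each level of recursion fixes one more bit of row and col.
        row <<= 1
        col <<= 1
        r = rng.random()
        if r < a:
            pass            # top-left quadrant
        elif r < a + b:
            col |= 1        # top-right quadrant
        elif r < a + b + c:
            row |= 1        # bottom-left quadrant
        else:
            row |= 1        # bottom-right quadrant
            col |= 1
    return row, col


def rmat_graph(scale, num_edges, seed=0, **probs):
    """Draw num_edges distinct directed edges; duplicates are redrawn."""
    if num_edges > 4 ** scale:
        raise ValueError("more edges requested than cells in the matrix")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        edges.add(rmat_edge(scale, rng=rng, **probs))
    return edges


if __name__ == "__main__":
    graph = rmat_graph(scale=4, num_edges=50, seed=1)
    print(len(graph), "edges over", 2 ** 4, "nodes")
```

Unequal quadrant probabilities concentrate edges in some regions of the matrix, which is how this kind of generator can produce skewed degree distributions and “community” effects. Setting all four probabilities to 0.25 would instead give a uniformly random graph.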


Similar resources

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project begins if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other is often complex and forms a network. Discovery of this network of technologies is time consumi...


Mining Billion-Scale Graphs: Patterns and Algorithms

Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at an unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning Giga, Tera, and heading toward Peta bytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algori...


A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures, carried out with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...


A Hybrid Meta-heuristic Approach to Cope with State Space Explosion in Model Checking Technique for Deadlock Freeness

Model checking is an automatic technique for software verification in which all reachable states are generated from an initial state in order to find errors and desirable patterns. In the model checking approach, the behavior and structure of the system should be modeled. A graph transformation system is a graphical formal modeling language used to specify and model the system. However, modeling of large s...


A Review of Data Mining Applications in the Health System

Introduction: Extensive amounts of data stored in medical databases require the development of specialized tools for accessing the data, data analysis, knowledge discovery, and the effective use of the data. Data mining is one of the most important methods. The article sketches the data mining techniques used, and illustrates their applicability to medical diagnostic and prognostic problems. ...




Publication date: 2005